Passive Sensors

Monocular camera

You might be surprised to know that a simple monocular color camera can be used to infer depth data of a given scene. While non-trivial, it is possible to use software techniques to calculate depth from multiple 2D images.

One such technique is Structure from Motion.
In this approach, a single moving camera captures multiple images of a given object or scene, and a 3D model is reconstructed from the resulting image sequence. Depth is calculated by triangulation, which requires accurate knowledge of the camera pose throughout its movement.
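
To make the triangulation step concrete, here is a minimal sketch that recovers one 3D point from its pixel positions in two frames. It assumes the camera poses are already known and uses standard linear (DLT) triangulation with NumPy. The intrinsics `K`, the 0.5 m camera motion, and the helper names `triangulate_point` and `project` are made up for illustration; a real Structure from Motion pipeline also has to match features between frames and estimate the poses, which this sketch skips.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices (assumed known).
    x1, x2: (u, v) pixel coordinates of the same point in each view.
    """
    # Each view contributes two linear constraints on the homogeneous point X:
    # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point into pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical intrinsics: 800 px focal length, principal point at (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# First pose at the origin; second pose after moving 0.5 m along x (no rotation).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

# Simulate observing a known point, then recover it from the two pixel positions.
X_true = np.array([0.2, -0.1, 4.0])
x1 = project(P1, X_true)
x2 = project(P2, X_true)
X_est = triangulate_point(P1, P2, x1, x2)
print(X_est)
```

With noise-free observations the recovered point matches the original one. With real images, errors in the pose estimate or in the pixel positions carry straight through to the depth, which is why accurate pose matters so much here.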

It turns out that the Structure from Motion result can be improved by combining these triangulation techniques with depth inference from single images. Check out this awesome paper by Saxena, Chung and Ng (2007) for the details.

Stereo camera

A stereo camera system consists of two monocular cameras separated by an accurately known distance, called the baseline. Depth information is obtained by comparing image frames from both cameras viewing the same object or scene.

The difference in the position of a given object in the scene, as perceived by the two cameras, is called disparity. Stereo cameras, much like human eyes, use this disparity to calculate the depth of that object.
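
For the common case of two identical, parallel cameras with rectified images, the relationship between disparity and depth is simple: depth = focal length x baseline / disparity, where the baseline is the distance between the two cameras. The sketch below shows that formula in Python. The function name and the numbers (800 px focal length, 0.125 m baseline) are assumptions chosen for illustration, not values from any particular camera.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of a point seen by a rectified, parallel stereo pair.

    disparity_px: horizontal pixel shift of the point between left and right images.
    focal_length_px: focal length expressed in pixels (assumed equal for both cameras).
    baseline_m: distance between the two camera centers, in meters.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is invalid here.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical rig: 800 px focal length, cameras 0.125 m apart.
near = depth_from_disparity(50, focal_length_px=800, baseline_m=0.125)
far = depth_from_disparity(25, focal_length_px=800, baseline_m=0.125)
print(near, far)  # 2.0 4.0
```

Note that halving the disparity doubles the depth. Distant objects produce very small disparities, so a one-pixel matching error matters far more at long range than up close.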